Key-concept extraction from French articles with KX

Authors

  • Sara Tonelli
  • Elena Cabrio
  • Emanuele Pianta
Abstract

We present an adaptation of KX, a system for multilingual unsupervised key-concept extraction, to the French text mining challenge (DEFT 2012). KX selects a list of weighted keywords from a document by combining basic linguistic annotations with simple statistical measures. To adapt it to French, a French morphological analyzer (PoS tagger) was added to the extraction pipeline to derive lexical patterns. In addition, parameters such as frequency thresholds for collocation extraction and indicators of key-concept relevance were computed and set on the training documents. In the DEFT 2012 tasks, KX achieved good results (0.27 F1 for Task 1 with a terminological list, and 0.19 F1 for Task 2) with limited additional effort for domain and language adaptation. Keywords: keyword extraction, linguistic patterns, terminology.
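The combination of linguistic annotation and simple statistics described in the abstract can be illustrated with a minimal sketch. This is not the KX implementation: the toy lexicon `TOY_POS`, the tag patterns in `PATTERNS`, the function `extract_key_concepts`, and the parameters `min_freq`, `max_len`, and `length_boost` are all hypothetical names chosen for illustration, and the scoring rule (frequency with a bonus for longer phrases) is an assumed stand-in for the relevance indicators mentioned above.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real French PoS tagger.
# Words not listed here receive the catch-all tag "X".
TOY_POS = {
    "extraction": "N",
    "termes": "N",
    "patrons": "N",
    "de": "P",
}

# Hypothetical lexical patterns: tag sequences accepted as candidates
# (noun, noun + adjective, noun + preposition + noun).
PATTERNS = {("N",), ("N", "A"), ("N", "P", "N")}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def extract_key_concepts(text, min_freq=2, max_len=3, length_boost=0.5):
    """Return (phrase, score) pairs sorted by descending score.

    Candidates are n-grams whose tag sequence matches PATTERNS and whose
    frequency reaches min_freq. The score is the frequency multiplied by
    a bonus that grows with phrase length (an assumed relevance indicator).
    """
    tokens = tokenize(text)
    tags = [TOY_POS.get(tok, "X") for tok in tokens]

    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            if tuple(tags[i:i + n]) in PATTERNS:
                counts[" ".join(tokens[i:i + n])] += 1

    scored = {
        phrase: freq * (1 + length_boost * (len(phrase.split()) - 1))
        for phrase, freq in counts.items()
        if freq >= min_freq
    }
    # Sort by score (descending), then alphabetically for stable output.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = (
        "L'extraction de termes est utile. "
        "L'extraction de termes repose sur des patrons. "
        "Les patrons décrivent les termes."
    )
    for phrase, score in extract_key_concepts(sample):
        print(f"{score:.1f}\t{phrase}")
```

In a setting like the one the abstract describes, the frequency threshold and the weighting would be tuned on training documents rather than fixed by hand, and the tags would come from an actual morphological analyzer.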

Similar resources

KX: A Flexible System for Keyphrase eXtraction

In this paper we present KX, a system for keyphrase extraction developed at FBK-IRST, which combines basic linguistic annotation with simple statistical measures to select a list of weighted keywords from a document. The system is flexible in that it lets the user set parameters such as frequency thresholds for collocation extraction and indicators for keyph...

Private Puncturable PRFs from Standard Lattice Assumptions

A puncturable pseudorandom function (PRF) has a master key k that enables one to evaluate the PRF at all points of the domain, and has a punctured key kx that enables one to evaluate the PRF at all points but one. The punctured key kx reveals no information about the value of the PRF at the punctured point x. Punctured PRFs play an important role in cryptography, especially in applications of i...

French Resources for Extraction and Normalization of Temporal Expressions with HeidelTime

In this paper, we describe the development of French resources for the extraction and normalization of temporal expressions with HeidelTime, an open-source multilingual, cross-domain temporal tagger. HeidelTime extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard. Several types of temporal expressions are extracted: dates, times, durations ...

Data extraction from machine-translated versus original language randomized trial reports: a comparative study

Background: Google Translate offers free Web-based translation, but it is unknown whether its translation accuracy is sufficient to use in systematic reviews to mitigate concerns about language bias. Methods: We compared data extraction from non-English language studies with extraction from translations by Google Translate of 10 studies in each of five languages (Chinese, French, German, Japane...

Knowledge discovery in bibliographic collections using concept hierarchies and visualization tools

This paper presents new methods for knowledge extraction and visualization, applied to datasets selected from the astronomical literature. One of the objectives is to detect correlations between concepts extracted from the documents. Concepts are generally meta-information which may be defined a priori, or may be extracted from the document contents and are organised along domain ontologies or ...

Publication date: 2012